UEval==1.0

tb run -d UEval==1.0 -a "<agent>" -m "<model>"

Loading tasks...

Website template modified from https://www.tbench.ai/.